independent variable: age in years (years)
dependent variable: (variable)
using the classical linear predictor
What we don't see, because it is a default argument, is actually hidden in our code:
the model uses the gaussian family and the identity link function
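A minimal sketch of that hidden default, using made-up data (the names x and y, the seed and the numbers are assumptions for illustration, not the slides' dataset): calling glm() without a family argument should give the same fit as spelling out the gaussian family with the identity link.

```r
# Illustrative data only (hypothetical, not from the slides)
set.seed(1)
x <- runif(100, 6, 10)
y <- 2 + 0.5 * x + rnorm(100)

fit_default  <- glm(y ~ x)                                       # family left implicit
fit_explicit <- glm(y ~ x, family = gaussian(link = "identity")) # family spelled out

# Compare the two fits and inspect the family that glm() actually used
all.equal(coef(fit_default), coef(fit_explicit))
family(fit_default)
```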
In a GLM, the link function connects the linear predictor (built from X) to the mean of the response variable Y:
its inverse re-maps the linear predictor onto the appropriate range of Y
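In symbols (the notation is assumed here, not taken from the slides: \eta for the linear predictor, \beta for the coefficients, \mu for the mean of Y, g for the link function):

$$
\eta = X\beta, \qquad g(\mu) = \eta, \qquad \mu = \mathbb{E}[Y \mid X] = g^{-1}(\eta)
$$

With the identity link, g(\mu) = \mu, so the mean of Y is the linear predictor itself and can take any real value.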
independent variable: age in years (years)
dependent variable: proportion of correct responses in a TRUE/FALSE task (accuracy)
using the classical linear predictor
the new data simulated from the model clearly fall outside the (0,1) range of possible values for accuracy
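A rough sketch of how this can be checked, assuming a data-generating curve like the one used later in these slides (written by hand here so that only base R is needed; the seed and object names are illustrative assumptions):

```r
set.seed(123)                           # arbitrary seed, for reproducibility only
k <- 50                                 # trials per child
N <- 1000                               # number of children
age  <- runif(N, 6, 10)
prob <- 0.5 + 0.5 * pnorm(-6 + age)     # assumed true accuracy, between .5 and 1
accuracy <- rbinom(N, k, prob) / k      # observed proportion correct

fit_gauss <- glm(accuracy ~ age)        # gaussian family, identity link (defaults)
sim <- simulate(fit_gauss, nsim = 1)[[1]]  # one new dataset drawn from the fitted model

range(accuracy)   # observed accuracy is a proportion, so it stays within [0, 1]
range(sim)        # simulated accuracy is not constrained to [0, 1]
mean(sim > 1)     # share of simulated values above 1
```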
In the first example an identity link was appropriate because
y spans from -inf to +inf.
Here an identity link is NOT appropriate because
y (accuracy) spans from 0 to 1.
In this case, link="logit" makes sure that y stays between 0 and 1.
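A small illustration of the difference, using base R's plogis() (inverse logit) and qlogis() (logit); the values of eta are arbitrary:

```r
eta <- c(-10, -2, 0, 2, 10)   # arbitrary values of the linear predictor

eta                  # identity link: the predicted mean equals eta, unbounded
p <- plogis(eta)     # inverse logit: 1 / (1 + exp(-eta))
p                    # always strictly between 0 and 1
qlogis(p)            # the logit link maps the probabilities back to eta
```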
independent variable: age in years (years)
dependent variable: proportion of correct responses in a TRUE/FALSE task (accuracy)
adding a new main effect: group
typically developing children (group = 0)
children with dyslexia (group = 1)
a positive interaction emerges
Call:
glm(formula = accuracy ~ age * group, data = d)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.529062   0.016916   31.28   <2e-16 ***
age          0.052541   0.002103   24.99   <2e-16 ***
group1      -0.566758   0.023871  -23.74   <2e-16 ***
age:group1   0.059790   0.002967   20.15   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.0030859)
Null deviance: 15.9775 on 999 degrees of freedom
Residual deviance: 3.0736 on 996 degrees of freedom
AIC: -2937
Number of Fisher Scoring iterations: 2
a negative interaction emerges
# Binomial GLM with a logit link; weights = number of trials (k) behind each proportion
fit = glm(accuracy ~ age * group, data = d,
          family = binomial(link = "logit"), weights = rep(k, nrow(d)))
summary(fit)
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = "logit"),
data = d, weights = rep(k, nrow(d)))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.26482    0.32430 -28.568  < 2e-16 ***
age          1.69491    0.04842  35.006  < 2e-16 ***
group1       1.55052    0.36909   4.201 2.66e-05 ***
age:group1  -0.40870    0.05457  -7.490 6.90e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8812.26 on 999 degrees of freedom
Residual deviance: 957.55 on 996 degrees of freedom
AIC: 3033
Number of Fisher Scoring iterations: 5
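A note on the weights argument used in this call: it tells glm() how many trials stand behind each proportion. A small sketch with made-up data (object names, seed and coefficients are illustrative assumptions) suggesting that this gives the same estimates as the cbind(successes, failures) formulation:

```r
set.seed(42)                                  # arbitrary seed
k <- 50                                       # trials per observation
n <- 200                                      # number of observations
x <- runif(n, 6, 10)
successes <- rbinom(n, k, plogis(-8 + x))     # hypothetical logistic data
prop <- successes / k                         # proportion of successes

# Proportion as response, number of trials passed through weights
fit_prop  <- glm(prop ~ x, family = binomial(link = "logit"),
                 weights = rep(k, n))
# Two-column response: successes and failures
fit_cbind <- glm(cbind(successes, k - successes) ~ x,
                 family = binomial(link = "logit"))

all.equal(coef(fit_prop), coef(fit_cbind))
```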
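library(psyphy)   # provides mafc.probit()
library(ggplot2)  # plotting

k = 50   # trials per child
N = 1000 # number of children
group = rbinom(N,1,.5) # 0 = typically developing, 1 = dyslexia
age = runif(N,6,10)    # age in years
# linear predictor: main effects of age and group only, NO interaction
eta = -6+1*age-1*group
# inverse mAFC probit link with 2 alternatives: probabilities between .5 and 1
probs = mafc.probit(.m = 2)$linkinv(eta)
# observed accuracy = proportion of correct responses out of k trials
accuracy = rbinom(n = N, size = k, prob = probs) / k
d = data.frame(
  age = age,
  age_c = age - mean(age),
  accuracy = accuracy,
  group = as.factor(group)
)
ggplot(d, aes(x = age, y = accuracy, color = group)) +
  geom_point() +
  scale_x_continuous(limits = c(6, 10), breaks = seq(6, 10, 1))

I did not simulate an interaction, so BOTH models find an interaction that is not there.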
Let's try the multiple-alternative forced-choice (mAFC) probit link, with a 50% guessing rate because the task is TRUE/FALSE
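For reference, the inverse of this link can be written as follows. The notation is assumed here (m alternatives, \Phi the standard normal CDF, \eta the linear predictor, p the probability of a correct response), and the formula reflects my understanding of psyphy's mafc.probit, so it is worth checking against the package documentation:

$$
p = \frac{1}{m} + \left(1 - \frac{1}{m}\right)\Phi(\eta)
$$

With m = 2 this becomes p = 0.5 + 0.5\,\Phi(\eta), so predicted accuracy is bounded between the guessing rate 0.5 and 1, rather than between 0 and 1 as with the logit or the standard probit link.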
no interaction emerges, as it should!
# Binomial GLM with the mAFC probit link for 2 alternatives (guessing rate = .5)
fit = glm(accuracy ~ age * group, data = d,
          family = binomial(link = mafc.probit(.m = 2)), weights = rep(k, nrow(d)))
summary(fit)
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = mafc.probit(.m = 2)),
data = d, weights = rep(k, nrow(d)))
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.911744   0.209295 -28.246  < 2e-16 ***
age          0.987424   0.029927  32.994  < 2e-16 ***
group1      -1.074050   0.266644  -4.028 5.62e-05 ***
age:group1   0.006767   0.036919   0.183    0.855
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8429.14 on 999 degrees of freedom
Residual deviance: 757.35 on 996 degrees of freedom
AIC: 2783.1
Number of Fisher Scoring iterations: 6
link="identity": equal intervals on X correspond to equal intervals on Y
(to do: label the x and y axes with the names of the variables from the example)
link="logit": equal intervals on X correspond to equal odds ratios (NOT equal intervals) on Y
link=mafc.probit(2): equal intervals on X do NOT correspond to equal intervals on Y
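These three statements can be checked numerically. A sketch in base R, where the inverse mAFC probit link is written by hand as 0.5 + 0.5 * pnorm(eta) (my assumption of what mafc.probit(2) computes) and the grid of values is arbitrary:

```r
eta <- seq(-2, 2, by = 1)              # equally spaced values on the predictor scale

y_identity <- eta                      # identity link: mean = eta
y_logit    <- plogis(eta)              # inverse logit
y_mafc     <- 0.5 + 0.5 * pnorm(eta)   # inverse mAFC probit, 2 alternatives (assumed form)

diff(y_identity)                       # equal steps on Y

diff(y_logit)                          # unequal steps on Y ...
odds <- y_logit / (1 - y_logit)
odds_ratio <- odds[-1] / odds[-length(odds)]
odds_ratio                             # ... but a constant odds ratio per unit step

diff(y_mafc)                           # unequal steps on Y
```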
Building a model means trying to recover the data-generating process which, unlike in the world of simulations, we can never know for sure
to do that, we must make important decisions:
choosing the most appropriate family of distributions, so that the predicted values of the dependent variable lie within its bounds
choosing the most appropriate link function: otherwise you are very likely to find non-linear effects (i.e., interactions) that are not there!
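Why the wrong link produces an interaction can be seen without fitting any model. A sketch, assuming the coefficients of the simulation in these slides (-6, +1 for age, -1 for group) and a hand-written inverse mAFC probit link for 2 alternatives (my assumption of what mafc.probit(.m = 2)$linkinv computes):

```r
age  <- c(6, 8, 10)
eta0 <- -6 + 1 * age              # group 0 on the generating (mAFC probit) scale
eta1 <- eta0 - 1                  # group 1: constant shift, i.e. no interaction

p0 <- 0.5 + 0.5 * pnorm(eta0)     # true accuracy, group 0
p1 <- 0.5 + 0.5 * pnorm(eta1)     # true accuracy, group 1

gap_true     <- eta1 - eta0              # constant at every age
gap_logit    <- qlogis(p1) - qlogis(p0)  # group gap as seen on the logit scale
gap_identity <- p1 - p0                  # group gap as seen on the identity scale

gap_true
gap_logit      # widens with age: a logit model needs a negative age:group term
gap_identity   # shrinks with age: an identity model needs a positive age:group term
```

The direction of the spurious term matches the signs of the age:group1 estimates reported in the identity and logit fits.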
We are conducting a systematic review of how often inappropriate link functions are used in psychological research and lead to finding a significant interaction
so far: quite often
Data simulation, code and presentation are available on GitHub at sitalaura/link-functions
Questions and feedback: laura.sita@studenti.unipd.it
Special thanks to
a negative interaction emerges
# Binomial GLM with the standard probit link (lower bound at 0, not at the guessing rate)
fit = glm(accuracy ~ age * group, data = d,
          family = binomial(link = "probit"), weights = rep(k, nrow(d)))
summary(fit)
Call:
glm(formula = accuracy ~ age * group, family = binomial(link = "probit"),
data = d, weights = rep(k, nrow(d)))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.21133    0.03018  73.280  < 2e-16 ***
age          0.81113    0.02295  35.337  < 2e-16 ***
group1      -0.79152    0.03400 -23.279  < 2e-16 ***
age:group1  -0.11299    0.02637  -4.285 1.83e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8812.26 on 999 degrees of freedom
Residual deviance: 853.32 on 996 degrees of freedom
AIC: 2928.7
Number of Fisher Scoring iterations: 6
Cognitive Science Arena 2026